    Mining top-k regular episodes from sensor streams

    The monitoring of human activities plays an important role in health-care applications and for the data mining community. Existing approaches address the recognition of activities occurring in sensor data streams; however, regular behaviors have not been studied. We therefore introduce TKRES, a new approach to discover the top-k most regular episodes from sensor streams. The top-k approach lets us control the size of the output, which keeps the result analysis from overwhelming the supervisor. TKRES uses a simple top-k list and a k-tree structure to maintain the top-k episodes and their occurrence information. We also investigate and report the performance of TKRES on two real-life smart home datasets.
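    The abstract does not spell out how episode regularity is measured. The sketch below makes a hypothetical assumption: an episode's regularity is the largest gap between its consecutive occurrences, including the gap from the stream start and the gap to the stream end, so smaller values mean more regular behavior. It also assumes occurrence timestamps per episode are already known. It keeps the k most regular episodes in a bounded heap rather than TKRES's top-k list and k-tree. `regularity` and `top_k_regular` are illustrative names, not part of TKRES.

```python
import heapq
from typing import Dict, List, Tuple


def regularity(times: List[int], start: int, end: int) -> int:
    """Assumed definition: largest gap between consecutive occurrences,
    counting the gap from the stream start to the first occurrence and from
    the last occurrence to the stream end. Smaller means more regular."""
    points = [start] + sorted(times) + [end]
    return max(b - a for a, b in zip(points, points[1:]))


def top_k_regular(occurrences: Dict[str, List[int]], k: int,
                  start: int, end: int) -> List[Tuple[int, str]]:
    """Return (regularity, episode) pairs for the k most regular episodes."""
    if k <= 0:
        return []
    # Max-heap on regularity (stored negated) holding the k best episodes so far.
    heap: List[Tuple[int, str]] = []
    for episode, times in occurrences.items():
        reg = regularity(times, start, end)
        if len(heap) < k:
            heapq.heappush(heap, (-reg, episode))
        elif reg < -heap[0][0]:
            # More regular than the least regular episode kept: swap it in.
            heapq.heapreplace(heap, (-reg, episode))
    return sorted((-neg_reg, episode) for neg_reg, episode in heap)
```

    For instance, an episode seen at times 8, 20, 32 and 44 in a stream spanning 0 to 50 has a regularity of 12. The bounded heap keeps memory proportional to k, in line with the goal of controlling the output size.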

    Mining High Utility Itemsets with Regular Occurrence

    High utility itemset mining (HUIM) plays an important role in the data mining community and in a wide range of applications. In retail, for example, it is used to find sets of sold products that yield high profit, low cost, etc. Such itemsets can help improve marketing strategies, plan promotions and advertisements, etc. However, since HUIM considers only the utility values of items/itemsets, it may not be sufficient to capture the product-buying behavior of customers, such as "regular purchases of sets of products with a high profit margin". To address this issue, the occurrence behavior of itemsets (in terms of regularity) is investigated together with their utility values. This leads to the problem of mining high utility itemsets with regular occurrence (MHUIR): finding sets of co-occurring items that have high utility values and occur regularly in a database. An efficient single-pass algorithm, called MHUIRA, is introduced, along with a new modified utility-list structure, called NUL, designed to efficiently maintain utility values and occurrence information and to speed up the computation of itemset utilities. Experimental studies on real and synthetic datasets, together with complexity analyses, show the efficiency of MHUIRA combined with NUL in terms of time and space usage for mining itemsets that satisfy both regularity and utility constraints.
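    To make the two constraints concrete, here is a brute-force sketch under the common HUIM definition. An itemset's utility in a transaction is the sum of quantity times unit profit over its items, and its total utility sums this over every transaction that contains the whole itemset. Regularity is assumed to be the maximal period between consecutive occurrences. The sketch enumerates every itemset, so it only illustrates what is being computed, not MHUIRA's single-pass strategy or the NUL structure. All function names are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

# Each transaction maps an item to its purchased quantity; tids are 1..len(db).
Transaction = Dict[str, int]


def max_period(tids: List[int], n: int) -> int:
    """Assumed regularity: largest gap between consecutive tids, including
    the gap before the first tid and after the last one (n transactions)."""
    points = [0] + tids + [n]
    return max(b - a for a, b in zip(points, points[1:]))


def utility_and_tids(itemset: Tuple[str, ...], db: List[Transaction],
                     profit: Dict[str, int]) -> Tuple[int, List[int]]:
    """Total utility (quantity * unit profit over the itemset's items, summed
    over transactions containing the whole itemset) and its supporting tids."""
    utility, tids = 0, []
    for tid, trans in enumerate(db, start=1):
        if all(item in trans for item in itemset):
            tids.append(tid)
            utility += sum(trans[item] * profit[item] for item in itemset)
    return utility, tids


def high_utility_regular(db: List[Transaction], profit: Dict[str, int],
                         min_util: int, max_reg: int
                         ) -> Dict[FrozenSet[str], Tuple[int, int]]:
    """Keep itemsets with utility >= min_util and regularity <= max_reg."""
    items = sorted({item for trans in db for item in trans})
    result: Dict[FrozenSet[str], Tuple[int, int]] = {}
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            utility, tids = utility_and_tids(itemset, db, profit)
            if not tids or utility < min_util:
                continue
            reg = max_period(tids, len(db))
            if reg <= max_reg:
                result[frozenset(itemset)] = (utility, reg)
    return result
```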

    Mining top-k frequent/regular patterns based on user-given trade-off between frequency and regularity

    Frequent-regular pattern mining has been introduced to extract interesting patterns based on their occurrence behavior. This approach uses frequency and regularity to determine the significance of patterns under user-given support and regularity thresholds. However, it is well known that setting thresholds that yield the most interesting results is very difficult, and it is more reasonable to spare users from specifying suitable thresholds by letting them assign only simple parameters. In this paper, we introduce an alternative approach, called top-k frequent/regular pattern mining based on weights of interest, which allows users to assign two simple parameters: (i) a weight of interest on frequency/regularity and (ii) the number of desired patterns. To mine these patterns, we propose TFRP-Mine, an efficient single-pass algorithm that quickly finds patterns with frequent/regular appearance. Experimental results show that our approach can effectively and efficiently discover valuable patterns that meet the users' interest.
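    The abstract does not give the formula that combines the two parameters. The sketch below uses a purely hypothetical score to show how a single weight can reorder the top-k results. The score is a weighted sum of normalized support and normalized regularity, controlled by a weight `w`. It enumerates itemsets by brute force and is not TFRP-Mine. The score function and all names are assumptions.

```python
import heapq
from itertools import combinations
from typing import List, Set, Tuple


def max_period(tids: List[int], n: int) -> int:
    """Largest gap between consecutive tids, including both database ends."""
    points = [0] + tids + [n]
    return max(b - a for a, b in zip(points, points[1:]))


def trade_off_score(tids: List[int], n: int, w: float) -> float:
    """Hypothetical score: w weights normalised support and (1 - w) weights
    normalised regularity, where a smaller maximal period scores higher."""
    frequency = len(tids) / n
    regularity = 1 - max_period(tids, n) / n
    return w * frequency + (1 - w) * regularity


def top_k_trade_off(db: List[Set[str]], k: int,
                    w: float) -> List[Tuple[float, Tuple[str, ...]]]:
    """Score every itemset that occurs at least once and keep the k best."""
    n = len(db)
    items = sorted(set().union(*db))
    scored = []
    for size in range(1, len(items) + 1):
        for itemset in combinations(items, size):
            tids = [tid for tid, trans in enumerate(db, start=1)
                    if set(itemset) <= trans]
            if tids:
                scored.append((trade_off_score(tids, n, w), itemset))
    return heapq.nlargest(k, scored)
```

    With `w = 1` the ranking follows support alone, and with `w = 0` it follows regularity alone. An item that is frequent but bursty can therefore drop below a less frequent but steadier one as `w` decreases.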

    Mining top-k frequent-regular closed patterns

    Frequent-regular pattern mining has recently attracted many works. Most approaches focus on discovering the complete set of patterns under user-given support and regularity threshold constraints. This leads to several quantitative and qualitative drawbacks. First, it is often difficult to set an appropriate support threshold. Second, algorithms produce a huge number of patterns, many of them redundant. Third, most of the patterns are very small, which makes it arduous to extract interesting relationships among items. A common solution to reduce the number of patterns is to specify the desired number k of outputs and mine the top-k patterns; this approach also removes the need for a support threshold. To cope with redundancy and to capture interesting relationships among items, we focus on closed patterns and introduce a minimal length constraint. We thus propose to mine the top-k frequent-regular closed patterns with a minimal length. An efficient single-pass algorithm, called TFRC-Mine, and a new compact bit-vector representation that allows uninteresting candidates to be pruned, are designed. Experiments show that the proposed algorithm efficiently produces longer, non-redundant patterns, and that the new data representation is efficient in both computational time and memory usage.
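    The sketch below illustrates three of the ingredients named above on a toy database. Item tidsets are stored as bit vectors (plain Python integers here), closedness is checked with bitwise AND, and a minimal length constraint is applied. It assumes regularity is the maximal period and ranks patterns by support. It enumerates candidates exhaustively instead of using TFRC-Mine's pruning, and every name is hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple


def item_masks(db: List[Set[str]]) -> Dict[str, int]:
    """Bit vector per item: bit (tid - 1) is set when the item occurs in tid."""
    masks: Dict[str, int] = {}
    for tid, trans in enumerate(db, start=1):
        for item in trans:
            masks[item] = masks.get(item, 0) | (1 << (tid - 1))
    return masks


def tids_of(mask: int) -> List[int]:
    """Decode a bit vector back into a sorted list of tids."""
    return [i + 1 for i in range(mask.bit_length()) if (mask >> i) & 1]


def max_period(tids: List[int], n: int) -> int:
    points = [0] + tids + [n]
    return max(b - a for a, b in zip(points, points[1:]))


def top_k_closed_regular(db: List[Set[str]], k: int, max_reg: int,
                         min_len: int) -> List[Tuple[int, Tuple[str, ...]]]:
    n = len(db)
    masks = item_masks(db)
    items = sorted(masks)
    found: List[Tuple[int, Tuple[str, ...]]] = []
    for size in range(max(min_len, 1), len(items) + 1):
        for itemset in combinations(items, size):
            mask = (1 << n) - 1  # start from "all transactions"
            for item in itemset:
                mask &= masks[item]  # AND of bit vectors = itemset's tidset
            if mask == 0:
                continue
            # Closed: no outside item occurs in every transaction
            # that contains the itemset.
            if any((masks[other] & mask) == mask
                   for other in items if other not in itemset):
                continue
            tids = tids_of(mask)
            if max_period(tids, n) <= max_reg:
                found.append((len(tids), itemset))
    # Highest support first; ties broken by itemset order.
    found.sort(key=lambda entry: (-entry[0], entry[1]))
    return found[:k]
```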

    Mining top-k periodic-frequent pattern from transactional databases without support threshold

    Temporal periodicity of patterns can be regarded as an important criterion for measuring the interestingness of frequent patterns in several applications. A frequent pattern is said to be periodic-frequent if it appears at regular intervals. In this paper, we introduce the problem of mining the top-k periodic-frequent patterns, i.e., the periodic patterns with the k highest supports. We propose MTKPP (Mining Top-K Periodic-frequent Patterns), an efficient single-pass algorithm that uses a best-first search strategy and requires no support threshold. Our experiments show that the proposed algorithm is efficient.
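    A best-first search without a support threshold can be sketched as follows. This is a simplified illustration in the spirit of the abstract, not MTKPP itself. It assumes periodicity is measured by the maximal period and builds item tidsets in one scan. It then always expands the candidate with the highest support, so the first k periodic itemsets popped are the top-k. Names are hypothetical.

```python
import heapq
from typing import Dict, List, Set, Tuple


def max_period(tids: List[int], n: int) -> int:
    points = [0] + tids + [n]
    return max(b - a for a, b in zip(points, points[1:]))


def top_k_periodic(db: List[Set[str]], k: int,
                   max_per: int) -> List[Tuple[int, Tuple[str, ...]]]:
    n = len(db)
    # Single database scan: collect the tidset of every item.
    tidsets: Dict[str, List[int]] = {}
    for tid, trans in enumerate(db, start=1):
        for item in trans:
            tidsets.setdefault(item, []).append(tid)
    items = sorted(tidsets)
    # Best-first frontier ordered by support (negated for heapq's min-heap).
    heap = [(-len(tidsets[i]), (i,), tidsets[i]) for i in items]
    heapq.heapify(heap)
    result: List[Tuple[int, Tuple[str, ...]]] = []
    while heap and len(result) < k:
        neg_sup, itemset, tids = heapq.heappop(heap)
        if max_period(tids, n) > max_per:
            # Supersets have fewer tids, so their maximal period cannot be
            # smaller: prune this branch entirely.
            continue
        result.append((-neg_sup, itemset))
        # Extend only with items after the last one, so each itemset is
        # generated exactly once.
        for item in items[items.index(itemset[-1]) + 1:]:
            child = sorted(set(tids) & set(tidsets[item]))
            if child:
                heapq.heappush(heap, (-len(child), itemset + (item,), child))
    return result
```

    Two properties keep this search correct. An extension never has more support than its parent, so supports leave the heap in non-increasing order. An extension's maximal period is never smaller than its parent's, so a non-periodic itemset can be pruned together with all of its supersets.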

    Efficient mining top-k regular-frequent itemset using compressed tidsets

    Association rule discovery based on the support-confidence framework is an important task in data mining. However, the occurrence frequency (support) of a pattern (itemset) may not be a sufficient criterion for discovering interesting patterns. Temporal regularity, which can be a trace of behavior, can be a key criterion alongside frequency in several applications. A pattern can be regarded as regular if it occurs regularly within a user-given period. In this paper, we consider the problem of mining top-k regular-frequent itemsets from transactional databases without a support threshold. We propose a new concise representation, called compressed transaction-ids set (compressed tidset), to maintain the occurrence information of patterns, and a single-pass algorithm, called TR-CT (Top-k Regular frequent itemset mining based on Compressed Tidsets), to discover the k regular itemsets with the highest supports. Experimental results show that the compressed tidset representation is highly efficient in terms of execution time and memory consumption, especially on dense datasets.
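    The abstract does not describe how tidsets are compressed. One plausible scheme, assumed here purely for illustration, stores runs of consecutive transaction ids as intervals. Support, maximal period and tidset intersection can then be computed on the runs directly. Long runs are common when items appear in most transactions, which would fit the reported gains on dense datasets. This is not necessarily TR-CT's representation, and all names are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # inclusive run of consecutive tids


def compress(tids: List[int]) -> List[Interval]:
    """Collapse tids into maximal runs, e.g. [1, 2, 3, 5] -> [(1, 3), (5, 5)]."""
    runs: List[Interval] = []
    for tid in sorted(tids):
        if runs and tid == runs[-1][1] + 1:
            runs[-1] = (runs[-1][0], tid)
        else:
            runs.append((tid, tid))
    return runs


def support(runs: List[Interval]) -> int:
    return sum(end - start + 1 for start, end in runs)


def max_period(runs: List[Interval], n: int) -> int:
    """Maximal period computed on runs only. Gaps inside a run are all 1 and
    the first period is already at least 1, so only the gaps before, between
    and after runs need to be inspected."""
    if not runs:
        return n
    periods = [runs[0][0]]
    periods += [nxt[0] - prev[1] for prev, nxt in zip(runs, runs[1:])]
    periods.append(n - runs[-1][1])
    return max(periods)


def intersect(x: List[Interval], y: List[Interval]) -> List[Interval]:
    """Two-pointer intersection of run lists, i.e. the tidset of the union
    of the two itemsets the lists belong to."""
    out: List[Interval] = []
    i = j = 0
    while i < len(x) and j < len(y):
        lo, hi = max(x[i][0], y[j][0]), min(x[i][1], y[j][1])
        if lo <= hi:
            out.append((lo, hi))
        if x[i][1] < y[j][1]:
            i += 1
        else:
            j += 1
    return out
```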

    Mining periodic-frequent itemsets with approximate periodicity using interval transaction-ids list tree

    Temporal periodicity of itemset appearance can be regarded as an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset is said to be periodic-frequent in a database if it appears at a regular interval given by the user. In this paper, we propose the concept of approximate periodicity for each itemset. We also propose a new tree-based data structure, called ITL-tree (Interval Transaction-ids List tree), which maintains an approximation of the occurrence information in a highly compact manner for periodic-frequent itemset mining. A pattern-growth mining approach generates all periodic-frequent itemsets by a bottom-up traversal of the ITL-tree, for user-given periodicity and support thresholds. The performance study shows that our data structure is very efficient for mining periodic-frequent itemsets with approximate periodicity.

    Mining top-k regular-frequent itemsets using database partitioning and support estimation

    Temporal regularity of itemset appearance can be regarded as an important criterion for measuring the interestingness of itemsets in several applications. A frequent itemset is said to be regular-frequent in a database if it appears regularly. Mining the complete set of regular-frequent itemsets therefore requires specifying both a support threshold and a regularity threshold. However, in practice, it is often difficult for users to provide an appropriate support threshold. In addition, using a support threshold tends to produce a large number of regular-frequent itemsets, and it may be better to ask for the number of desired results. We thus propose an efficient algorithm for mining top-k regular-frequent itemsets without setting a support threshold. Based on database partitioning and support estimation techniques, the proposed algorithm uses a best-first search strategy with only one database scan. We compare our algorithm with state-of-the-art algorithms for mining top-k regular-frequent itemsets. Our experimental studies on both synthetic and real data show that our proposal achieves high performance for both small and large values of k.
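    The abstract gives no details on how support is estimated, so the following scheme is an assumption. During the single scan, each item's support is recorded per partition. An itemset's support is then bounded by summing, over partitions, the smallest per-item count. The bound is never below the true support, so a candidate whose estimate falls below the current k-th highest support can be discarded without computing its tidset. Names are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def partition_counts(db: List[Set[str]], num_parts: int) -> Dict[str, List[int]]:
    """One scan: per-item support inside each of num_parts consecutive partitions."""
    size = max(1, -(-len(db) // num_parts))  # ceiling division
    counts: Dict[str, List[int]] = {}
    for tid, trans in enumerate(db):
        part = tid // size
        for item in trans:
            counts.setdefault(item, [0] * num_parts)[part] += 1
    return counts


def estimated_support(itemset: Sequence[str], counts: Dict[str, List[int]],
                      num_parts: int) -> int:
    """Upper bound on support: within each partition an itemset cannot occur
    more often than its least frequent item."""
    zeros = [0] * num_parts
    return sum(min(counts.get(item, zeros)[p] for item in itemset)
               for p in range(num_parts))
```

    More partitions give a tighter bound. With a single partition the estimate is just the smaller of the two item supports.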

    Habit monitoring over sensor streams

    The use of emerging sensing and communication technologies opens new opportunities for helping elderly or frail people continue living in their homes. However, the raw data may be overwhelming for the supervisor. In this poster, we present ongoing work aimed at discovering habits by analyzing the sensor data stream. Habits are sought in the form of episodes, that is, collections of sensor events that exhibit temporal regularities. These regularities are assessed on combinations of interest measures, such as frequency, periodicity, maximal inter-occurrence gap, length, etc. We propose structures and methods to discover, and update over time, the k most interesting episodes based on the user's preferences over the interest measures.
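    Some of the measures listed above can be updated incrementally as the stream advances. The sketch below is a hypothetical illustration, not the structures from the poster. It maintains an episode's frequency and maximal inter-occurrence gap in constant time per new occurrence, and reports the gap still open since the last occurrence. Periodicity is left out because its definition is not given here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class EpisodeStats:
    """Hypothetical per-episode interest measures maintained over a sensor stream."""
    length: int                 # number of sensor events in the episode
    start: int = 0              # timestamp at which monitoring began
    count: int = 0              # frequency: occurrences seen so far
    last: Optional[int] = None  # timestamp of the latest occurrence
    max_gap: int = 0            # largest closed gap between occurrences

    def add_occurrence(self, t: int) -> None:
        """O(1) update when the episode is detected at time t (t non-decreasing)."""
        prev = self.start if self.last is None else self.last
        self.max_gap = max(self.max_gap, t - prev)
        self.last = t
        self.count += 1

    def max_gap_at(self, now: int) -> int:
        """Maximal gap including the one still open since the last occurrence."""
        prev = self.start if self.last is None else self.last
        return max(self.max_gap, now - prev)
```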